Abstract

Using a characterizing equation for the Beta distribution, Stein's method is applied
to obtain bounds of the optimal order for the Wasserstein distance between the
distribution of the scaled number of white balls drawn from a Polya-Eggenberger urn and
its limiting Beta distribution. The bound is computed by making a direct comparison
between characterizing operators of the target and the Beta distribution, the former
derived by extending Stein's density approach to discrete distributions.

1 Introduction 

The classical Polya-Eggenberger urn at time zero contains $\alpha \ge 1$ white and $\beta \ge 1$ black balls,
and at every positive integer time a ball is chosen uniformly from the urn, independently of
the past, and replaced along with an additional ball of the same color. With $\mathcal{L}$ indicating
distribution, or law, and $\to_d$ indicating convergence in distribution, it is well known
that if $S_n$ is the number of additional white balls added to the urn by time
$n = 0, 1, 2, \ldots$, then as $n \to \infty$

$$\mathcal{L}(W_n) \to_d \mathcal{B}(\alpha,\beta) \quad\text{where}\quad W_n = \frac{S_n}{n}.$$

Here, for positive real numbers $\alpha$ and $\beta$, we let $\mathcal{B}(\alpha,\beta)$ denote the Beta distribution having
density

$$p(x;\alpha,\beta) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)}\,\mathbf{1}\{x \in [0,1]\}, \tag{1}$$

where $B(\alpha,\beta) = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta)$ is the Beta function as expressed in terms of the
Gamma function $\Gamma(x)$. Using Stein's method we derive a bound of the optimal $1/n$ rate on
the Wasserstein distance between the distribution of $W_n$ and its limiting Beta distribution
in terms of $n$, $\alpha$, $\beta$ and small explicit constants.
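The urn dynamics described above can be turned into an exact computation of the law of $S_n$. The following is a minimal sketch, assuming integer initial counts so that exact rational arithmetic applies; the function names `urn_pmf` and `mean_fraction` are illustrative and not part of the original text.

```python
from fractions import Fraction

def urn_pmf(n, alpha, beta):
    """Exact law of S_n for an urn started with integer alpha white, beta black balls."""
    pmf = [Fraction(1)]                      # S_0 = 0 with probability one
    for m in range(n):                       # m extra balls are already in the urn
        total = alpha + beta + m
        nxt = [Fraction(0)] * (m + 2)
        for k, p in enumerate(pmf):          # k of the m added balls are white
            nxt[k + 1] += p * Fraction(alpha + k, total)     # a white ball is drawn
            nxt[k] += p * Fraction(beta + m - k, total)      # a black ball is drawn
        pmf = nxt
    return pmf

def mean_fraction(n, alpha, beta):
    """E[W_n] with W_n = S_n / n."""
    return sum(Fraction(k, n) * p for k, p in enumerate(urn_pmf(n, alpha, beta)))
```

For example, `mean_fraction(10, 2, 3)` equals the Beta mean $\alpha/(\alpha+\beta) = 2/5$, and for $\alpha = \beta = 1$ the law of $S_n$ is uniform on $\{0,\ldots,n\}$.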
Beginning with the introduction by Stein [19] of a 'characterizing equation' type method
for developing bounds in normal approximation, the method has to date been successfully
applied to a large number of the classical distributions, including the Poisson, Gamma,
Negative Binomial and Geometric. Here we further extend the scope of Stein's method to
include the Beta distribution, focusing on its role as the limiting distribution
of the fraction of white balls added to the Polya-Eggenberger urn.

Urn models of the classical type, and generalizations including drawing multiple balls, 
or starting new urns, have received considerable attention recently; see for example [1], [3] 
and [7]. Interest has partly been sparked by the ability of urn models to exhibit power-law 
limiting behaviour, which in turn has been a focus of network analysis, see for example [9]
and [17]. Connections between urn models and binary search trees are clearly explained in
[16]. In particular, the initial state of the Polya-Eggenberger urn can be viewed as a rooted
binary tree having $\alpha$ white and $\beta$ black leaves, or external nodes. At every time step one
external node is chosen, uniformly, to duplicate, yielding a pair of leaves of the same colour. 
That is, the chosen external node becomes an internal node while two external nodes of the 
chosen colour are added. The number of white leaves of the tree at time n clearly has the 
same growth rule as the number of white balls added in the Polya-Eggenberger urn, and 
hence the same distribution. 

2 Characterizing equations and generators 

Stein's method for distributional approximation is based on a characterization of the target
approximating distribution. For the seminal normal case considered in [19], it was shown
that a variable $Z$ has the standard normal distribution if and only if

$$E[Zf(Z)] = E[f'(Z)] \tag{2}$$

for all absolutely continuous functions $f$ for which these expectations exist. If a variable
$W$ has an approximate normal distribution, then one expects it to satisfy (2) approximately.
More specifically, if one wishes to test the difference between the distribution of $W$ and
the standard normal $Z$ on a function $h$, then instead of computing $Eh(W) - Nh$, where
$Nh = Eh(Z)$, one may set up a 'Stein equation'

$$f'(w) - wf(w) = h(w) - Nh \tag{3}$$

for the given $h$, solve for $f(w)$, and, upon replacing $w$ by $W$ in (3), calculate the expectation
of the right hand side by taking expectation on the left. At first glance it may seem that
doing so does not make the given problem any less difficult. However, a number of techniques
may be brought to bear on the quantity $E[f'(W) - Wf(W)]$. In particular, this expression
contains only the single random variable $W$, in contrast to the difference of the expectations
of $h(W)$ and $h(Z)$, which depends on two distributions.

To obtain our result, we actually compute the distance between the distribution of the
fraction of white balls in the Polya-Eggenberger urn and the Beta by comparing the
operators that characterize them. Our approach in characterizing the urn distribution stems
from what is known as the density method; see for instance [20], [18] or [I], Section 13.1. In
particular, recognizing the $-x$ in (3) as the ratio $\phi'(x)/\phi(x)$, where $\phi(x)$ is the standard
normal density, one hopes to replace the term $-x$ by the ratio $p'(x)/p(x)$ when developing
the Stein equation to handle the distribution with density $p(x)$, and to apply similar
reasoning when the distribution under study is discrete. Use of the density method in the
discrete case, followed by the application of a judiciously chosen transformation, leads to the
characterization of the Polya-Eggenberger urn distribution given in Lemma 2.1.
Another approach to constructing characterizing equations is known as the generator method.
A number of years following the publication of [19], the relationship between the
characterizing equation (2) and the generator of the Ornstein-Uhlenbeck process

$$\mathcal{A}f(x) = f''(x) - xf'(x),$$

of which the normal is the unique stationary measure, was recognized in [2], who noted
that in some generality the process semi-group may be used to solve the Stein equation (3).

Given the relation between certain Stein characterizations and generators, when extending
Stein's method to handle a new distribution it is natural to consider a stochastic process
which has the given target as its stationary distribution.

Regarding the use of this 'generator' method for extending the scope of Stein's method
to the Beta distribution, we recall that the Fisher Wright model from genetics, originating
in the work in [11], [22] and [23], is a stochastic process used to model genetic drift in a
population and has generator given by

$$\mathcal{A}f(w) = w(1-w)f''(w) + (\alpha(1-w) - \beta w)f'(w) \tag{4}$$

for positive $\alpha$ and $\beta$, and that the $\mathcal{B}(\alpha,\beta)$ distribution is its unique stationary distribution.
In particular, with $Z \sim \mathcal{B}(\alpha,\beta)$ we have $E\mathcal{A}f(Z) = 0$. Let $\mathcal{B}_{\alpha,\beta}h = Eh(Z)$, the $\mathcal{B}(\alpha,\beta)$
expectation of a function $h$; we drop the subscripts when the role of the parameters $\alpha$ and
$\beta$ is clear. As $Eh(Z) - \mathcal{B}h$ is zero, we are led to consider a Stein equation for the Beta
distribution of the form

$$w(1-w)f'(w) + (\alpha(1-w) - \beta w)f(w) = h(w) - \mathcal{B}h. \tag{5}$$

Lemma 2.1 provides a characterizing equation for the Polya urn distribution that is
parallel to equation (5). Taking differences then allows us to estimate the expectation of the
right hand side of (5) when $w$ is replaced by $W_n$, exploiting the similarity of characterizing
operators for use with Stein's method; a similar argument can be found in [10] and [13] for
stationary distributions of birth-death chains. Recently [8] analysed a different Beta Stein
equation, $(1-w^2)g'(w) - (\alpha+\beta-2)wg(w) - (\beta-\alpha)g(w) = h(w) - \mathcal{B}h$. Some connections
between [8] and the present work are discussed in Remark 3.1.

First we introduce some notation. We say a subset $I$ of the integers $\mathbb{Z}$ is a finite integer
interval if $I = [a,b] \cap \mathbb{Z}$ for $a, b \in \mathbb{Z}$ with $a \le b$, and an infinite integer interval if either
$I = (-\infty,b] \cap \mathbb{Z}$ or $I = [a,\infty) \cap \mathbb{Z}$ or $I = \mathbb{Z}$. For a real valued function $f$ let
$\Delta f(k) = f(k+1) - f(k)$, the forward difference operator, and for a real valued function $p$ taking
non-zero values in the integer interval $I$ let

$$\psi(k) = \Delta p(k)/p(k) \quad\text{for } k \in I. \tag{6}$$

For $Z$ a random variable having probability mass function $p$ with support an integer
interval $I$, let $\mathcal{F}(p)$ denote the set of all real-valued functions $f$ such that either $E\Delta f(Z-1)$ or
$E\psi(Z)f(Z)$ is finite, $\lim_{n\to\pm\infty} f(n)p(n+1) = 0$, and, in the case where $a$ is finite, $f(a-1) = 0$.
Lemma 2.1 Let $p$ be the probability mass function of $S_n$, the number of additional white
balls drawn from Polya's urn by time $n$, with initial state $\alpha \ge 1$ white and $\beta \ge 1$ black
balls. A random variable $S$ has probability mass function $p$ if and only if for all functions
$f \in \mathcal{F}(p)$

$$E\left[S(\beta+n-S)\Delta f(S-1) + \{(n-S)(\alpha+S) - S(\beta+n-S)\}f(S)\right] = 0. \tag{7}$$

We prove Lemma 2.1 by applying a general technique for constructing equations such as
(7) from discrete probability mass functions, which is of independent interest; see [13]. We
begin with Proposition 2.1 below, a discrete version of the density approach to the Stein
equation.

Proposition 2.1 Let $Z$ have probability mass function $p$ with support in the integer interval
$I$, and let $\psi(k)$ be given by (6) for $k \in I$. Then a random variable $X$ with support $I$ has
mass function $p$ if and only if for all $f \in \mathcal{F}(p)$,

$$E(\Delta f(X-1) + \psi(X)f(X)) = 0. \tag{8}$$

Remark 2.1 The statement in Proposition 2.1 is equivalent to Theorem 2.1 given in [?],
under a different assumption, namely that the equation holds for all functions $g$ (instead of
$f$) for which $g(x)p(x)$ is bounded and $g(\inf\{k : p(k) > 0\}) = 0$. We note that their set-up
would translate to test functions $f(k) = g(k+1)$, recovering the restriction $f(a-1) = 0$ in
the case where $a$ is finite.

Proof: Let $p$ be a real valued function defined in the integer interval $[a,b+1] \cap \mathbb{Z}$ for $a, b \in \mathbb{Z}$
with $a \le b$, and assume that $p(b+1) = 0$ and $f(a-1) = 0$. Applying the summation by
parts formula in the first line below, we obtain

$$\begin{aligned}
\sum_{k=a}^{b} f(k)\Delta p(k) &= -\sum_{k=a}^{b} p(k+1)\Delta f(k) + f(b+1)p(b+1) - f(a)p(a) \\
&= -\sum_{k=a+1}^{b+1} p(k)\Delta f(k-1) + f(b+1)p(b+1) - f(a)p(a) \\
&= -\sum_{k=a}^{b} p(k)\Delta f(k-1) + f(b)p(b+1) - f(a-1)p(a).
\end{aligned} \tag{9}$$

Using that $p(b+1) = f(a-1) = 0$, we obtain the identity

$$\sum_{k=a}^{b} f(k)\Delta p(k) = -\sum_{k=a}^{b} p(k)\Delta f(k-1), \tag{10}$$

implying that (8) holds when $p$ is a probability mass function with support $[a,b] \cap \mathbb{Z}$ and
$f(a-1) = 0$.

The case where $I$ is an infinite integer interval follows by applying Abel's Lemma on
summation by parts as modified by [5], [6]. In particular, if at least one of the series in (10) is
convergent upon replacing $b$ by $\infty$, and if $\lim_{k\to\infty} f(k)p(k+1) = 0$, then (3a) of [6] shows
that (10) holds upon replacing $b$ by infinity, completing the argument when $I = [a,\infty)$.
Similarly, (3b) of [6] can be used to argue the case when $I = \mathbb{Z}$.

For $I = (-\infty,b]$, with $a \le b$, $a \in \mathbb{Z}$, letting $f_a(k) = f(k)\mathbf{1}(k \ge a)$, since $p(b+1) = 0$ and
$p(a)f(a-1) \to 0$ as $a \to -\infty$, applying (10) yields

$$\begin{aligned}
E\Delta f(X-1) &= \lim_{a\to-\infty}\sum_{k=a}^{b} p(k)\Delta f(k-1) \\
&= \lim_{a\to-\infty}\left(\sum_{k=a}^{b} p(k)\Delta f_a(k-1) - p(a)f(a-1)\right) \\
&= -\lim_{a\to-\infty}\sum_{k=a}^{b} f_a(k)\Delta p(k) = -\sum_{k=-\infty}^{b} f(k)\Delta p(k) = -E\psi(X)f(X)
\end{aligned}$$

by the Dominated Convergence Theorem.

Conversely, if $X$ satisfies (8) then for any $\ell \in I$ setting $f(k) = \mathbf{1}(k = \ell)$, a function in
$\mathcal{F}(p)$, yields

$$\begin{aligned}
0 &= E(\Delta f(X-1) + \psi(X)f(X)) \\
&= \sum_{k\in I} P(X=k)\{(f(k) - f(k-1)) + \psi(k)f(k)\} \\
&= P(X=\ell) - P(X=\ell+1) + P(X=\ell)\,\frac{P(Z=\ell+1) - P(Z=\ell)}{P(Z=\ell)},
\end{aligned}$$

and rearranging gives

$$\frac{P(X=\ell+1)}{P(X=\ell)} = \frac{P(Z=\ell+1)}{P(Z=\ell)}.$$

As the successive ratios of the mass functions of $X$ and $Z$ are the same in $I$, and each sums
to one over $\ell \in I$, we obtain $P(X=\ell) = P(Z=\ell)$ for all $\ell \in I$, and hence for all $\ell \in \mathbb{Z}$. □

Given a characterization produced by Proposition 2.1, the following corollary produces
varieties of characterizations for the same distribution from choices of a function $c$ possessing
certain mild properties.

Corollary 2.1 Let $Z$ be a random variable with probability mass function $p$ having support
in the finite integer interval $I = [a,b] \cap \mathbb{Z}$ where $a \le b$, $a, b \in \mathbb{Z}$, and let $\psi$ be given by (6).
Let $c : [a-1,b] \cap \mathbb{Z} \to \mathbb{R} \setminus \{0\}$ be a function satisfying

$$E(\psi(Z)c(Z))^2 < \infty. \tag{11}$$

Then in order that a random variable $X$ have mass function $p$ it is necessary and sufficient
that, for all functions $f \in \mathcal{F}(p)$ such that $E(f(Z)^2) < \infty$,

$$E\left[c(X-1)\Delta f(X-1) + [c(X)\psi(X) + c(X) - c(X-1)]f(X)\right] = 0. \tag{12}$$

Proof: The Cauchy-Schwarz inequality shows that for $f \in \mathcal{F}(p)$ such that $E(f(Z)^2) < \infty$,
we have $cf \in \mathcal{F}(p)$. The necessity now follows directly from the proposition by replacing
$f(x)$ by $c(x)f(x)$ in (8). For the sufficiency, the condition that $c$ is non-zero on its domain
guarantees that we can replace $f(k) = \mathbf{1}(k = \ell)$ by $f(k) = c(\ell)^{-1}\mathbf{1}(k = \ell)$ in the proof of
Proposition 2.1 to obtain the assertion. □



We illustrate Proposition 2.1 and Corollary 2.1 for the Poisson distribution $\mathcal{P}(\lambda)$. By (6)
we have $\psi(k) = \lambda/(k+1) - 1$ for all $k$ in the support of $\mathcal{P}(\lambda)$, the interval $\mathbb{Z}_{\ge 0}$ of nonnegative
integers. Direct application of Proposition 2.1 yields that for all functions $f \in \mathcal{F}(\mathcal{P}(\lambda))$,

$$E\left[\Delta f(Z-1) + \left(\frac{\lambda}{Z+1} - 1\right)f(Z)\right] = E\left[\frac{\lambda}{Z+1}f(Z) - f(Z-1)\right] = 0,$$

a nonstandard version of a characterization of the Poisson mass function. Corollary 2.1
produces the usual characterization of the Poisson distribution by the choice $c(x) = x+1$
and then the substitution $g(k) = f(k-1)$. We note that $E(\psi(Z)c(Z))^2 = E(\lambda - Z - 1)^2 < \infty$,
so this choice of $c(x)$ satisfies (11). Naturally, additional characterizations are produced when
using different choices of $c$.
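Both Poisson characterizations in the illustration above can be checked numerically. The following is a rough sketch using a truncated Poisson mass function in floating point; the truncation level `kmax` and the helper names are assumptions made only for this illustration.

```python
from math import exp, sin

def poisson_pmf(lam, kmax):
    """Poisson(lam) probabilities p_0, ..., p_kmax via p_{k+1} = p_k * lam / (k + 1)."""
    p = [exp(-lam)]
    for k in range(kmax):
        p.append(p[-1] * lam / (k + 1))
    return p

def nonstandard_stein(f, lam, kmax=150):
    """Truncated E[lam/(Z+1) f(Z) - f(Z-1)], with the convention f(-1) = 0."""
    p = poisson_pmf(lam, kmax)
    return sum(pk * (lam / (k + 1) * f(k) - (f(k - 1) if k >= 1 else 0.0))
               for k, pk in enumerate(p))

def usual_stein(g, lam, kmax=150):
    """Truncated E[lam g(Z+1) - Z g(Z)], the usual Poisson characterization."""
    p = poisson_pmf(lam, kmax)
    return sum(pk * (lam * g(k + 1) - k * g(k)) for k, pk in enumerate(p))
```

With a bounded test function such as `sin` and `lam = 3.5`, both expectations vanish up to truncation and rounding error.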

Remark 2.2 The square integrability condition (11) in Corollary 2.1 is stronger than
needed, and is imposed for the convenience of applying the Cauchy-Schwarz inequality, but
suffices for our purposes here. Additionally, the assumptions on $f$ in Corollary 2.1 are
slightly stronger than the ones for Proposition 2.1 in order to separate assumptions on $c$ and
on $f$.

Note that when $I = [a,b] \cap \mathbb{Z}$ or $I = (-\infty,b] \cap \mathbb{Z}$ with $a \le b$, $a, b \in \mathbb{Z}$, so that $b$ is finite,
we automatically have $\psi(b) = -1$, in which case $c(b)$ does not appear in (12), and may be
assigned a value arbitrarily.

We now apply Corollary 2.1 to the distribution of the number $S_n = S_n^{\alpha,\beta}$ of white balls
added to the Polya-Eggenberger urn by time $n$, where the urn initially contains $\alpha$ white and
$\beta$ black balls. We suppress $\alpha$ and $\beta$ for notational ease unless clarity demands it. It is well
known, and not at all difficult to verify, that the distribution $p_k = P(S_n = k)$, $k \in \mathbb{Z}$, satisfies

$$p_k = \binom{n}{k}\frac{(\alpha)_k(\beta)_{n-k}}{(\alpha+\beta)_n}, \quad k = 0, \ldots, n, \tag{13}$$

where $(x)_0 = 1$ and otherwise $(x)_k = x(x+1)\cdots(x+k-1)$ is the rising factorial. This
distribution is also known as the beta-binomial and the negative hypergeometric distribution,
see [21]. We now have the ingredients to prove Lemma 2.1.

Proof of Lemma 2.1: Taking differences in (13) yields, for $k = 0, \ldots, n-1$,

$$\Delta p_k = \binom{n}{k}\frac{(\alpha)_k(\beta)_{n-k}}{(\alpha+\beta)_n}\left(\frac{(n-k)(\alpha+k)}{(k+1)(\beta+n-k-1)} - 1\right),$$

while for $k = n$,

$$\Delta p_n = -\frac{(\alpha)_n}{(\alpha+\beta)_n} = -p_n.$$

Hence with $\psi(k) = \Delta p_k/p_k$ as in (6) we obtain, for $k = 0, \ldots, n-1$,

$$\psi(k) = \frac{(n-k)(\alpha+k) - (k+1)(\beta+n-k-1)}{(k+1)(\beta+n-k-1)}$$

and $\psi(n) = -1$.

In applying Corollary 2.1, as $\psi(n) = -1$ we may take the value $c(n)$ arbitrarily, see
Remark 2.2. In particular, taking $c(k) = (k+1)(\beta+n-k-1)$ for all $k = 0, \ldots, n-1$ and
$c(n) = n$ we obtain (7). □

The next lemma is instrumental in calculating the higher moments of $S_n^{\alpha,\beta}$. We let
$[x]_0 = 1$, and otherwise set $[x]_k = x(x-1)\cdots(x-k+1)$, the falling factorial.

Lemma 2.2 For all nonnegative integers $n$, $a$ and $b$, we have

$$E\left([S_n^{\alpha,\beta}]_a[n - S_n^{\alpha,\beta}]_b\right) = [n]_{a+b}\frac{(\alpha)_a(\beta)_b}{(\alpha+\beta)_{a+b}}. \tag{14}$$

Proof: First we show that both sides of (14) are zero when $a + b \ge n + 1$. This is clear for
the right hand side, as the falling factorial $[n]_{a+b}$ is zero. For the left hand side, if $S_n \le a - 1$
then $[S_n]_a = 0$. On the other hand, if $S_n \ge a$ then $b - 1 \ge n - a \ge n - S_n$, in which case
$[n - S_n]_b$ is zero.

Now assume $n \ge a + b$. For any $k = 0, 1, \ldots, n$ we have

$$\begin{aligned}
[k]_a[n-k]_b\,P(S_n^{\alpha,\beta} = k)
&= [k]_a[n-k]_b\binom{n}{k}\frac{(\alpha)_k(\beta)_{n-k}}{(\alpha+\beta)_n} \\
&= \frac{[n]_{a+b}(\alpha)_a(\beta)_b}{(\alpha+\beta)_{a+b}}\binom{n-a-b}{k-a}\frac{(\alpha+a)_{k-a}(\beta+b)_{n-k-b}}{(\alpha+\beta+a+b)_{n-a-b}} \\
&= \frac{[n]_{a+b}(\alpha)_a(\beta)_b}{(\alpha+\beta)_{a+b}}\,P\left(a + S_{n-a-b}^{\alpha+a,\beta+b} = k\right).
\end{aligned}$$

Summing over $k = 0, 1, \ldots, n$ and using that the support of $S_m$ is $\{0, \ldots, m\}$ yields (14). □

If $Z$ has the limiting beta distribution $\mathcal{B}(\alpha,\beta)$ with density (1), using (14) we obtain

$$E\left(\frac{[S_n]_a[n-S_n]_b}{n^{a+b}}\right) = \frac{[n]_{a+b}}{n^{a+b}}\frac{(\alpha)_a(\beta)_b}{(\alpha+\beta)_{a+b}} = \frac{[n]_{a+b}}{n^{a+b}}\frac{B(\alpha+a,\beta+b)}{B(\alpha,\beta)} = \frac{[n]_{a+b}}{n^{a+b}}E\left(Z^a(1-Z)^b\right), \tag{15}$$

that is, the scaled falling factorial moments of $S_n$ and the power moments of $Z$ differ only
by factors of order $1/n$. This observation can be used to provide a proof of convergence in
distribution of $W_n = S_n/n$ to $Z$ by the method of moments, but without a bound on the
distributional distance.
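The factorial moment identity (14) lends itself to an exact check for small parameters. The sketch below assumes integer $\alpha$ and $\beta$ and the beta-binomial mass function (13); all function names are illustrative.

```python
from fractions import Fraction
from math import comb

def rising(x, k):
    """Rising factorial (x)_k."""
    out = 1
    for i in range(k):
        out *= x + i
    return out

def falling(x, k):
    """Falling factorial [x]_k = x (x-1) ... (x-k+1), with [x]_0 = 1."""
    out = 1
    for i in range(k):
        out *= x - i
    return out

def pmf(n, alpha, beta):
    """Mass function (13) of S_n for integer alpha and beta."""
    denom = rising(alpha + beta, n)
    return [Fraction(comb(n, k) * rising(alpha, k) * rising(beta, n - k), denom)
            for k in range(n + 1)]

def moment_lhs(n, alpha, beta, a, b):
    """E([S_n]_a [n - S_n]_b), computed by direct summation."""
    return sum(p * falling(k, a) * falling(n - k, b)
               for k, p in enumerate(pmf(n, alpha, beta)))

def moment_rhs(n, alpha, beta, a, b):
    """Right hand side of (14)."""
    return Fraction(falling(n, a + b) * rising(alpha, a) * rising(beta, b),
                    rising(alpha + beta, a + b))
```

Comparing `moment_lhs` and `moment_rhs` over a small grid, including cases with $a + b > n$ where both sides vanish, gives a quick sanity check of the lemma.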



3 Bounds for the Polya-Eggenberger urn model 

Theorem 3.1 provides an explicit bound, in Wasserstein distance, of order $1/n$ between the
distribution of $W_n$, the fraction of white balls added to the urn, and the limiting Beta.
For approximating a discrete distribution by a continuous one the Wasserstein distance is a
typical distance to use, see for example [12], which, for measuring the distance between the
laws of random variables $X$ and $Y$, takes the value

$$d_W(X,Y) = \sup_{h \in \mathrm{Lip}(1)}|E[h(X)] - E[h(Y)]|,$$

where $\mathrm{Lip}(1)$ is the class of all Lipschitz-continuous functions with Lipschitz constant less
than or equal to 1. The Wasserstein distance defines a metric on the set of probability
measures on a separable metric space. For measures on a measurable space, convergence
under the Wasserstein distance implies weak convergence. For measures on metric spaces of
bounded diameter, the Wasserstein distance metrizes weak convergence, see [12]. In what
follows we let $x \wedge y$ and $x \vee y$ denote the minimum and maximum of two real numbers $x$ and
$y$, respectively.
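For laws on the real line the Wasserstein distance can also be written as the integral of the absolute difference of the two distribution functions, a standard identity that is not stated in the text. The sketch below uses it to approximate $d_W(W_n, Z)$ numerically; the midpoint quadrature, the grid size `cells_per_step` and the function names are all assumptions of this illustration.

```python
from math import comb, gamma

def rising(x, k):
    """Rising factorial (x)_k in floating point."""
    out = 1.0
    for i in range(k):
        out *= x + i
    return out

def urn_cdf(n, alpha, beta):
    """P(S_n <= k) for k = 0..n, from the beta-binomial mass function."""
    denom = rising(alpha + beta, n)
    cdf, acc = [], 0.0
    for k in range(n + 1):
        acc += comb(n, k) * rising(alpha, k) * rising(beta, n - k) / denom
        cdf.append(acc)
    return cdf

def beta_pdf(x, alpha, beta):
    """Beta(alpha, beta) density at x in (0, 1)."""
    const = gamma(alpha + beta) / (gamma(alpha) * gamma(beta))
    return const * x ** (alpha - 1) * (1 - x) ** (beta - 1)

def wasserstein_to_beta(n, alpha, beta, cells_per_step=200):
    """Midpoint-rule approximation of the integral of |F_{W_n} - F_Z| over [0, 1]."""
    cdf_w = urn_cdf(n, alpha, beta)
    m = n * cells_per_step
    h = 1.0 / m
    beta_cdf, dist = 0.0, 0.0
    for i in range(m):
        x = (i + 0.5) * h
        mass = beta_pdf(x, alpha, beta) * h          # Beta mass of the current cell
        k = i // cells_per_step                      # floor(n x): F_{W_n}(x) = P(S_n <= k)
        dist += abs(cdf_w[k] - (beta_cdf + 0.5 * mass)) * h
        beta_cdf += mass
    return dist
```

In the uniform case $\alpha = \beta = 1$ the integral can be evaluated by hand as $(2n+1)/(6n(n+1))$, which decays like $1/(3n)$ and gives a convenient reference value for the sketch.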

Theorem 3.1 For $\alpha \ge 1$ and $\beta \ge 1$ let $S_n$ be the number of additional white balls obtained
in $n$ draws of a Polya-Eggenberger urn that initially contains $\alpha$ white and $\beta$ black balls. Then
with $W_n = S_n/n$ and $Z \sim \mathcal{B}(\alpha,\beta)$,

1. When $\{\alpha,\beta\} \subset (1,\infty)$,

$$d_W(W_n,Z) \le \frac{1}{2n}\left[1 + a_1(\alpha+\beta) + (a_1+a_2)\left(1 + \alpha\vee\beta + \frac{2\alpha\beta}{\alpha+\beta}\right)\right]$$

where

$$a_1 = \frac{\alpha+\beta-2}{\alpha\wedge\beta-1} \quad\text{and}\quad a_2 = \frac{(\alpha+\beta-2)^2\,(2(\alpha\wedge\beta)-1)}{((\alpha\wedge\beta)-1)^2}. \tag{16}$$

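2. If $\alpha > 1$ and $\beta = 1$ then

$$d_W(W_n,Z) \le \frac{1}{2n}\left(3 + 23\alpha + 7\alpha^2\right),$$

and if $\alpha = 1$ and $\beta > 1$ then

$$d_W(W_n,Z) \le \frac{1}{2n}\left(3 + 23\beta + 7\beta^2\right).$$

3. If $\alpha = \beta = 1$,

$$d_W(W_n,Z) \le \frac{10}{n}.$$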



Remark 3.1 In Theorem 4.3 of [8] a bound of order $1/n$ without explicit constants is
obtained for the Beta approximation to the Polya-Eggenberger urn for test functions with
bounded first and second derivatives by using an exchangeable pair coupling. To our
knowledge the bound in Theorem 3.1 here, of order $1/n$ with small, explicit constants for the
Wasserstein distance, is new.

The $1/n$ order of the bound in Theorem 3.1 cannot be improved. Taking $h(x) = x(1-x)$
for $x \in [0,1]$, a function in $\mathrm{Lip}(1)$, from (15) with $a = 1$, $b = 1$ we obtain that for all
$\alpha \ge 1$, $\beta \ge 1$,

$$d_W(W_n,Z) \ge |Eh(W_n) - Eh(Z)| = \frac{1}{n}E(Z(1-Z)) = \frac{1}{n}\,\frac{\alpha\beta}{(\alpha+\beta)(\alpha+\beta+1)}.$$
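The lower bound computation in Remark 3.1 can be reproduced exactly for integer parameters. This is a small sketch under that assumption, with the beta-binomial mass function (13) and the standard Beta moment $E(Z(1-Z)) = \alpha\beta/((\alpha+\beta)(\alpha+\beta+1))$; the names `pmf` and `gap` are illustrative.

```python
from fractions import Fraction
from math import comb

def rising(x, k):
    """Rising factorial (x)_k."""
    out = 1
    for i in range(k):
        out *= x + i
    return out

def pmf(n, alpha, beta):
    """Mass function (13) of S_n for integer alpha and beta."""
    denom = rising(alpha + beta, n)
    return [Fraction(comb(n, k) * rising(alpha, k) * rising(beta, n - k), denom)
            for k in range(n + 1)]

def gap(n, alpha, beta):
    """Return (|E h(W_n) - E h(Z)|, E h(Z)) for the test function h(x) = x (1 - x)."""
    e_w = sum(p * Fraction(k, n) * (1 - Fraction(k, n))
              for k, p in enumerate(pmf(n, alpha, beta)))
    e_z = Fraction(alpha * beta, (alpha + beta) * (alpha + beta + 1))
    return abs(e_w - e_z), e_z
```

For every $n$ the first returned value equals the second divided by $n$, which is the exact $1/n$ gap used in the remark.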



In the following we set our test functions $h$ to be zero outside the unit interval $[0,1]$. For
$y > 0$ set

$$\Delta_y f(x) = f(x+y) - f(x),$$

and for a real valued function $g$ on $[0,1]$ we let $\|g\| = \sup_{w\in[0,1]}|g(w)|$, the supremum norm
of $g$. In the following we recall that, with the help of Rademacher's Theorem, a function $h$
is in $\mathrm{Lip}(1)$ if and only if it is absolutely continuous with respect to Lebesgue measure with
a.e. derivative bounded in absolute value by 1.

Lemma 3.1 below shows that

$$f(w) = \frac{1}{w^{\alpha}(1-w)^{\beta}}\int_0^w u^{\alpha-1}(1-u)^{\beta-1}(h(u) - \mathcal{B}h)\,du, \quad w \in [0,1], \tag{17}$$

is the unique bounded solution of the Stein equation (5). In the proof below, we will also
invoke Lemmas 3.3, 3.4 and 3.5, which yield the required properties of $f$.
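A simple closed-form case helps to make the solution formula (17) concrete. For the test function $h(u) = u$, with $\mathcal{B}h = \alpha/(\alpha+\beta)$, the constant function $f \equiv -1/(\alpha+\beta)$ satisfies the Stein equation (5), as direct substitution shows. The sketch below evaluates (17) by midpoint quadrature and compares it with this constant; the quadrature and the name `stein_solution` are assumptions of the illustration.

```python
def stein_solution(h, bh, w, alpha, beta, cells=20000):
    """Midpoint-rule evaluation of the solution (17) at a point w in (0, 1)."""
    step = w / cells
    integral = 0.0
    for i in range(cells):
        u = (i + 0.5) * step
        integral += u ** (alpha - 1) * (1 - u) ** (beta - 1) * (h(u) - bh)
    return integral * step / (w ** alpha * (1 - w) ** beta)

alpha, beta = 2.5, 1.5
value = stein_solution(lambda u: u, alpha / (alpha + beta), 0.3, alpha, beta)
```

Here `value` should agree with $-1/(\alpha+\beta) = -0.25$ up to quadrature error.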



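Proof of Theorem 3.1: Let $f$ be the solution of the Stein equation (5) given in (17)
for $\{\alpha,\beta\} \subset (1,\infty)$ and $h$ a given function in $\mathrm{Lip}(1)$. Replacing $f(z)$ by $f(z/n)$ and dividing
by $n$ in (7) results in

$$\begin{aligned}
0 &= E\left[S_n\left(\frac{\beta}{n} + 1 - W_n\right)\Delta_{1/n}f\left(W_n - \frac{1}{n}\right)\right]
+ E\left[\left\{(n-S_n)\left(\frac{\alpha}{n} + W_n\right) - S_n\left(\frac{\beta}{n} + 1 - W_n\right)\right\}f(W_n)\right] \\
&= E\left[nW_n\left(\frac{\beta}{n} + 1 - W_n\right)\Delta_{1/n}f\left(W_n - \frac{1}{n}\right) + \{\alpha(1-W_n) - \beta W_n\}f(W_n)\right].
\end{aligned}$$

Applying this identity in the Stein equation (5), and invoking Lemmas 3.3, 3.4 and 3.5
below to yield the existence and boundedness of $f'$, we obtain

$$\begin{aligned}
Eh(W_n) - \mathcal{B}h
&= E\left(W_n(1-W_n)f'(W_n) + [\alpha(1-W_n) - \beta W_n]f(W_n)\right) \\
&= E\left(W_n(1-W_n)f'(W_n) - nW_n\left(\frac{\beta}{n} + 1 - W_n\right)\Delta_{1/n}f\left(W_n - \frac{1}{n}\right)\right) \\
&= E\left(W_n(1-W_n)f'(W_n) - nW_n(1-W_n)\Delta_{1/n}f\left(W_n - \frac{1}{n}\right)\right) + R_1,
\end{aligned} \tag{18}$$

where $R_1 = -\beta E\left[W_n\Delta_{1/n}f\left(W_n - \frac{1}{n}\right)\right]$ and, using Lemma 2.2 to calculate moments, we obtain

$$|R_1| = \beta\left|EW_n\Delta_{1/n}f\left(W_n - \frac{1}{n}\right)\right| \le \frac{\beta}{n}\|f'\|EW_n = \frac{\alpha\beta}{n(\alpha+\beta)}\|f'\|. \tag{19}$$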
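Writing the difference in (18) as an integral, we have

$$\begin{aligned}
&E\left(W_n(1-W_n)f'(W_n) - nW_n(1-W_n)\Delta_{1/n}f\left(W_n - \frac{1}{n}\right)\right) \\
&\quad= EW_n(1-W_n)\left(f'(W_n) - n\int_{W_n-\frac{1}{n}}^{W_n}f'(x)\,dx\right) \\
&\quad= nE\int_{W_n-\frac{1}{n}}^{W_n}W_n(1-W_n)(f'(W_n) - f'(x))\,dx \\
&\quad= nE\int_{W_n-\frac{1}{n}}^{W_n}\left(W_n(1-W_n)f'(W_n) - x(1-x)f'(x)\right)dx + R_2,
\end{aligned} \tag{20}$$

where

$$R_2 = -nE\int_{W_n-\frac{1}{n}}^{W_n}\left(W_n(1-W_n) - x(1-x)\right)f'(x)\,dx.$$

To handle $R_2$,

$$\begin{aligned}
|R_2| &= \left|nE\int_{W_n-\frac{1}{n}}^{W_n}f'(x)\int_x^{W_n}(1-2y)\,dy\,dx\right|
\le \|f'\|\,nE\int_{W_n-\frac{1}{n}}^{W_n}\int_x^{W_n}dy\,dx \\
&= \|f'\|\,nE\int_{W_n-\frac{1}{n}}^{W_n}(W_n - x)\,dx = \frac{1}{2n}\|f'\|.
\end{aligned} \tag{21}$$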



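For the first term in (20), substituting using the Stein equation (5) and then integrating
by parts yields

$$\begin{aligned}
&nE\int_{W_n-\frac{1}{n}}^{W_n}\left(W_n(1-W_n)f'(W_n) - x(1-x)f'(x)\right)dx \\
&\quad= nE\int_{W_n-\frac{1}{n}}^{W_n}\left\{h(W_n) - h(x) + (\beta W_n - \alpha(1-W_n))f(W_n) - (\beta x - \alpha(1-x))f(x)\right\}dx \\
&\quad= nE\int_{W_n-\frac{1}{n}}^{W_n}\left\{\int_x^{W_n}h'(y)\,dy + \int_x^{W_n}\left[(\beta y - \alpha(1-y))f'(y) + (\beta+\alpha)f(y)\right]dy\right\}dx.
\end{aligned}$$

We bound the inner integrals separately. Firstly,

$$\left|nE\int_{W_n-\frac{1}{n}}^{W_n}\int_x^{W_n}h'(y)\,dy\,dx\right| \le n\|h'\|E\int_{W_n-\frac{1}{n}}^{W_n}(W_n - x)\,dx = \frac{1}{2n}\|h'\| \le \frac{1}{2n}.$$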
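Next, recalling that $0 \le W_n \le 1$ and noting that $|\beta y - \alpha(1-y)| \le \alpha\vee\beta$ for $0 \le y \le 1$,

$$\begin{aligned}
\left|nE\int_{W_n-\frac{1}{n}}^{W_n}\int_x^{W_n}(\beta y - \alpha(1-y))f'(y)\,dy\,dx\right|
&= \left|nE\int_{W_n-\frac{1}{n}}^{W_n}\int_{W_n-\frac{1}{n}}^{y}(\beta y - \alpha(1-y))f'(y)\,dx\,dy\right| \\
&\le n\|f'\|(\alpha\vee\beta)E\int_{W_n-\frac{1}{n}}^{W_n}\int_{W_n-\frac{1}{n}}^{y}dx\,dy
= \frac{1}{2n}\|f'\|(\alpha\vee\beta).
\end{aligned}$$

Arguing in a similar fashion, we also obtain

$$\left|nE\int_{W_n-\frac{1}{n}}^{W_n}\int_x^{W_n}(\beta+\alpha)f(y)\,dy\,dx\right|
\le n(\beta+\alpha)\|f\|E\int_{W_n-\frac{1}{n}}^{W_n}\int_{W_n-\frac{1}{n}}^{y}dx\,dy = \frac{1}{2n}(\alpha+\beta)\|f\|. \tag{22}$$

Collecting the bounds (19) and (21) through (22) yields

$$|Eh(W_n) - Eh(Z)| \le \left(\frac{1 + \alpha\vee\beta}{2n} + \frac{\alpha\beta}{n(\alpha+\beta)}\right)\|f'\| + \frac{1}{2n}(\alpha+\beta)\|f\| + \frac{1}{2n}.$$

The three cases of the theorem can now be demonstrated by invoking Lemmas 3.3, 3.4 and
3.5, respectively. □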

Lemma 3.1 For any $\{\alpha,\beta\} \subset [1,\infty)$ and real valued function $h$ on $[0,1]$ such that the
expectation $\mathcal{B}h$ of $h$ with respect to $\mathcal{B}(\alpha,\beta)$ exists, the function $f$ given by (17) is the unique
bounded solution of (5).

Proof: It is straightforward to verify that (17) solves (5). Writing the associated homogeneous
equation as

$$\left(w^{\alpha-1}(1-w)^{\beta-1}\right)^{-1}\left(w^{\alpha}(1-w)^{\beta}f(w)\right)' = 0,$$

we find that all solutions to (5) are given by

$$f(w) + \frac{c}{w^{\alpha}(1-w)^{\beta}}, \quad c \in \mathbb{R}. \tag{23}$$

As the second term in (23) is unbounded at the endpoints of the unit interval for all $c \ne 0$,
Lemmas 3.3, 3.4 and 3.5 below, demonstrating the boundedness of $f(w)$, show that (17) is
the unique bounded solution to (5) for all $\{\alpha,\beta\} \subset [1,\infty)$. □

Since the expectation of $h(Z) - \mathcal{B}h$ is zero when $Z \sim \mathcal{B}(\alpha,\beta)$, we may also write

$$f(w) = -\frac{1}{w^{\alpha}(1-w)^{\beta}}\int_w^1 u^{\alpha-1}(1-u)^{\beta-1}(h(u) - \mathcal{B}h)\,du. \tag{24}$$

For any function $g$ on the unit interval, let $\tilde g(x) = g(1-x)$, and similarly for a value $c(\alpha,\beta)$
depending on $\alpha$ and $\beta$, let

$$\tilde c(\alpha,\beta) = c(\beta,\alpha). \tag{25}$$

Writing the solution (17) more precisely as $f_{h,\alpha,\beta}(x)$ we have the following simple fact.
Lemma 3.2 If $h$ is a function on $[0,1]$ such that $\mathcal{B}_{\alpha,\beta}h$ exists, then

$$\mathcal{B}_{\beta,\alpha}\tilde h = \mathcal{B}_{\alpha,\beta}h \quad\text{and}\quad f_{\tilde h,\beta,\alpha}(x) = -\tilde f_{h,\alpha,\beta}(x),$$

where $\tilde h(x) = h(1-x)$ and $\tilde f_{h,\alpha,\beta}(x) = f_{h,\alpha,\beta}(1-x)$.

The proof is omitted; the two facts are easily shown with the help of the change of variable
$v = 1 - u$, and the expressions (17) and (24) for the solution.

When $h$ is absolutely continuous with almost everywhere derivative $h'$, with $p(y;\alpha,\beta)$
denoting the $\mathcal{B}(\alpha,\beta)$ density, we have

$$\begin{aligned}
\|h - \mathcal{B}h\| = \sup_{x\in[0,1]}|h(x) - \mathcal{B}h| &\le \sup_{x\in[0,1]}\int_0^1|h(x) - h(y)|\,p(y;\alpha,\beta)\,dy \\
&\le \|h'\|\sup_{x\in[0,1]}\int_0^1|x - y|\,p(y;\alpha,\beta)\,dy \le \|h'\|.
\end{aligned} \tag{26}$$

Our next result bounds the magnitude of the solution $f$ in terms of $h$ when both $\alpha$ and $\beta$
are greater than 1.

Lemma 3.3 Let $\{\alpha,\beta\} \subset (1,\infty)$ and let $f$ be the solution to (5) given by (17) for a given
function $h$, and let $a_1$ and $a_2$ be as given in (16). When $h$ is bounded then

$$\|f\| \le a_1\|h - \mathcal{B}h\|, \tag{27}$$

and if in addition $h$ is absolutely continuous then

$$\|f\| \le a_1\|h'\| \quad\text{and}\quad \|f'\| \le a_2\|h - \mathcal{B}h\| + a_1\|h'\| \le (a_2 + a_1)\|h'\|. \tag{28}$$


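Proof: First, by replacing $h$ by $h - \mathcal{B}_{\alpha,\beta}h$ we may assume $\mathcal{B}_{\alpha,\beta}h = 0$, and therefore, by Lemma
3.2, that $\mathcal{B}_{\beta,\alpha}\tilde h = 0$. Next, it is straightforward to verify that for $\{\alpha,\beta\} \subset (1,\infty)$ the Beta
density increases up to its unique mode at

$$x_{\alpha,\beta} = \frac{\alpha-1}{\alpha+\beta-2}, \tag{29}$$

and decreases thereafter. In particular, since the density is increasing on $x \in [0,x_{\alpha,\beta}]$, for
such $x$ we have

$$\begin{aligned}
|f(x)| &\le \frac{1}{x^{\alpha}(1-x)^{\beta}}\int_0^x u^{\alpha-1}(1-u)^{\beta-1}\|h\|\,du
\le \frac{x^{\alpha-1}(1-x)^{\beta-1}}{x^{\alpha}(1-x)^{\beta}}\int_0^x\|h\|\,du \\
&= \frac{\|h\|}{1-x} \le \frac{\|h\|}{1-x_{\alpha,\beta}} = \frac{\|h\|}{x_{\beta,\alpha}}.
\end{aligned} \tag{30}$$

Now consider $x \in [x_{\alpha,\beta},1]$, for which $1-x \in [0,x_{\beta,\alpha}]$. Applying Lemma 3.2 and the
bound (30) with the roles of $\alpha$ and $\beta$ reversed yields

$$|f_{h,\alpha,\beta}(x)| = |f_{\tilde h,\beta,\alpha}(1-x)| \le \frac{\|\tilde h\|}{\tilde x_{\beta,\alpha}} = \frac{\|h\|}{x_{\alpha,\beta}}, \tag{31}$$

where in the final equality we have applied definition (25) to $x_{\alpha,\beta}$ in (29). The proof of (27)
is completed by noting

$$\|f\| \le (\alpha+\beta-2)\max\left\{\frac{1}{\alpha-1},\frac{1}{\beta-1}\right\}\|h\| = \frac{\alpha+\beta-2}{\alpha\wedge\beta-1}\|h\|.$$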

When $h$ is absolutely continuous, the first inequality in (28) follows from (27) and (26).
By the Stein equation (5),

$$x(1-x)f'(x) = h(x) + (\beta x - \alpha(1-x))f(x), \tag{32}$$

so to show the second inequality in (28) it suffices to demonstrate that for all $x \in [0,1]$

$$|h(x) + (\beta x - \alpha(1-x))f(x)| \le (a_2\|h\| + a_1\|h'\|)\,x(1-x). \tag{33}$$

We first prove that a bound of the form (33) holds for all $x \in [0,x_{\alpha,\beta}]$. In fact, we show that
for all $x \in [0,x_{\alpha,\beta}]$

$$|\beta xf(x)| \le b_1\|h\|\,x(1-x) \quad\text{and}\quad |h(x) - \alpha(1-x)f(x)| \le (b_2\|h\| + b_3\|h'\|)\,x(1-x) \tag{34}$$

where, with $x_{\alpha,\beta}$ as in (29),

$$b_1 = \frac{\beta}{(1-x_{\alpha,\beta})^2}, \quad b_2 = \frac{\beta-1}{(1-x_{\alpha,\beta})^2} \quad\text{and}\quad b_3 = \frac{1}{1-x_{\alpha,\beta}}.$$
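To prove the first inequality in (34), note that by (30) we have

$$|\beta xf(x)| \le \beta x\frac{\|h\|}{1-x} \le b_1\|h\|\,x(1-x) \quad\text{for all } x \in [0,x_{\alpha,\beta}].$$

For the second inequality in (34) we will apply the identity

$$\int_0^x \alpha w^{\alpha-1}(1-w)^{\beta-1}h(w)\,dw
= x^{\alpha}(1-x)^{\beta-1}h(x) + \int_0^x w^{\alpha}\left((\beta-1)(1-w)^{\beta-2}h(w) - (1-w)^{\beta-1}h'(w)\right)dw, \tag{35}$$

which may be obtained by a simple integration by parts. Using (17) and (35) we have

$$\begin{aligned}
h(x) - \alpha(1-x)f(x) &= h(x) - \frac{1}{x^{\alpha}(1-x)^{\beta-1}}\int_0^x \alpha w^{\alpha-1}(1-w)^{\beta-1}h(w)\,dw \\
&= \frac{1}{x^{\alpha}(1-x)^{\beta-1}}\left(-(\beta-1)\int_0^x w^{\alpha}(1-w)^{\beta-2}h(w)\,dw + \int_0^x w^{\alpha}(1-w)^{\beta-1}h'(w)\,dw\right).
\end{aligned}$$

As $x_{\alpha,\beta} \le x_{\alpha+1,\beta-1}$, the function $w^{\alpha}(1-w)^{\beta-2}$ is increasing over $[0,x_{\alpha,\beta}]$, so for $x$ in this
interval we have

$$\left|\int_0^x w^{\alpha}(1-w)^{\beta-2}h(w)\,dw\right| \le x^{\alpha}(1-x)^{\beta-2}\|h\|\int_0^x dw = x^{\alpha+1}(1-x)^{\beta-2}\|h\|,$$

and similarly for $x \in [0,x_{\alpha,\beta}]$ we obtain

$$\left|\int_0^x w^{\alpha}(1-w)^{\beta-1}h'(w)\,dw\right| \le x^{\alpha+1}(1-x)^{\beta-1}\|h'\|.$$

Hence for all $x \in [0,x_{\alpha,\beta}]$,

$$|h(x) - \alpha(1-x)f(x)| \le \frac{(\beta-1)\|h\|\,x}{1-x} + \|h'\|\,x
= \left(\frac{\beta-1}{(1-x)^2}\|h\| + \frac{1}{1-x}\|h'\|\right)x(1-x) \le (b_2\|h\| + b_3\|h'\|)\,x(1-x).$$

Hence inequality (34) holds in the interval $[0,x_{\alpha,\beta}]$, so by the triangle inequality (33) holds
on $[0,x_{\alpha,\beta}]$ with $a_1$ and $a_2$ replaced respectively by

$$a_{2,L} = b_1 + b_2 \quad\text{and}\quad a_{1,L} = b_3.$$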

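In particular, as $(2\beta-1)/(\beta-1)^2$ is decreasing for $\beta > 1$,

$$b_1 + b_2 = \frac{2\beta-1}{(1-x_{\alpha,\beta})^2} = \frac{(2\beta-1)(\alpha+\beta-2)^2}{(\beta-1)^2} \le a_2.$$

Hence, by (32),

$$|f'_{h,\alpha,\beta}(x)| \le a_{2,L}\|h\| + a_{1,L}\|h'\| \quad\text{for all } x \in [0,x_{\alpha,\beta}]. \tag{36}$$

Now consider $x \in [x_{\alpha,\beta},1]$. Using Lemma 3.2 and arguing as in (31) we have

$$|f'_{h,\alpha,\beta}(x)| = |f'_{\tilde h,\beta,\alpha}(1-x)| \le \tilde a_{2,L}\|\tilde h\| + \tilde a_{1,L}\|\tilde h'\| = \tilde a_{2,L}\|h\| + \tilde a_{1,L}\|h'\|.$$

Hence (36) holds for all $x \in [0,1]$ with $a_{2,L}$ and $a_{1,L}$ replaced by
$a_2 = \max\{a_{2,L},\tilde a_{2,L}\}$ and $a_1 = \max\{a_{1,L},\tilde a_{1,L}\}$, as
claimed, thus verifying the first bound on the derivative in (28); the second bound now
follows by (26). □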

The cases where either $\alpha$ or $\beta$ takes the value one need to be given special attention.

Lemma 3.4 For $\beta = 1$ and all $\alpha > 1$, for any $h$ such that $\mathcal{B}_{\alpha,1}h$ exists, the solution $f$ to
the Stein equation (5) satisfies

$$\|f\| \le 2\|h - \mathcal{B}_{\alpha,1}h\|. \tag{37}$$

If in addition $h$ is absolutely continuous,

$$\|f\| \le 2\|h'\| \quad\text{and}\quad \|f'\| \le 5\alpha\|h - \mathcal{B}_{\alpha,1}h\| + \frac{2\alpha}{\alpha+1}\|h'\| \le 7\alpha\|h'\|. \tag{38}$$

When $\alpha = 1$ and $\beta > 1$ these same statements hold with the roles of $\alpha$ and $\beta$ reversed.
Proof: As before, replace $h$ by $h - \mathcal{B}h$ and assume $\mathcal{B}h = 0$. First consider the case $\beta = 1$,
$\alpha > 1$. In this case, the density function of the $\mathcal{B}(\alpha,\beta)$ distribution, and the solution (17)
to the Stein equation specialize, respectively, to $p(x;\alpha,1) = \alpha x^{\alpha-1}$ and

$$f(w) = \frac{1}{w^{\alpha}(1-w)}\int_0^w u^{\alpha-1}h(u)\,du. \tag{39}$$

Letting $x_\alpha = 1 - 1/(2\alpha)$, for $w \in [0,x_\alpha]$ we obtain

$$|f(w)| = \left|\frac{1}{w^{\alpha}(1-w)}\int_0^w u^{\alpha-1}h(u)\,du\right| \le \frac{\|h\|}{w^{\alpha}(1-x_\alpha)}\int_0^w u^{\alpha-1}\,du = \frac{\|h\|}{\alpha(1-x_\alpha)} = 2\|h\|.$$

Defining the function $(1-w^{\alpha})/(1-w)$ to be continuous at $w = 1$ by taking the value $\alpha$,
since $g(w) = 1 - w^{\alpha} - \alpha(1-w)$ is increasing on $[0,1]$ with terminal value $g(1) = 0$, we obtain

$$\frac{1-w^{\alpha}}{1-w} \le \alpha \quad\text{for all } w \in [0,1].$$

Hence, when $w \in [x_\alpha,1]$, we obtain, with (24),

$$|f(w)| = \left|\frac{1}{w^{\alpha}(1-w)}\int_w^1 u^{\alpha-1}h(u)\,du\right| \le \frac{\|h\|}{w^{\alpha}(1-w)}\int_w^1 u^{\alpha-1}\,du
\le \frac{\|h\|}{\alpha x_\alpha^{\alpha}}\,\frac{1-w^{\alpha}}{1-w} \le \frac{\|h\|}{x_\alpha^{\alpha}} \le 2\|h\|,$$

since $(1 - 1/(2\alpha))^{-\alpha}$ equals 2 at $\alpha = 1$ and decreases for $\alpha \in [1,\infty)$. Hence inequality (37)
is shown, and the first inequality in (38) now follows by (26).

Turning to the bound on $f'(w)$, the Stein equation (5) specialized to this case becomes

$$w(1-w)f'(w) = h(w) + (w - \alpha(1-w))f(w),$$

so for the remaining claim in (38) it suffices to demonstrate that for all $w \in [0,1]$

$$|h(w) + (w - \alpha(1-w))f(w)| \le (a_3\|h\| + a_4\|h'\|)\,w(1-w) \tag{40}$$

with $a_3 \le 5\alpha$ and $a_4 \le 2\alpha/(\alpha+1)$.

Again, let $w \in [0,x_\alpha]$. We may bound $wf(w)$ by

$$|wf(w)| \le w\|f\| \le \frac{1}{1-x_\alpha}\,w(1-w)\|f\| = 2\alpha\,w(1-w)\|f\| \le 4\alpha\|h\|\,w(1-w),$$

applying (37) for the final inequality. For the remaining difference $h(w) - \alpha(1-w)f(w)$,
(39) and integration by parts produce

$$\begin{aligned}
|h(w) - \alpha(1-w)f(w)| &= \left|h(w) - \frac{1}{w^{\alpha}}\int_0^w \alpha u^{\alpha-1}h(u)\,du\right| = \left|\frac{1}{w^{\alpha}}\int_0^w u^{\alpha}h'(u)\,du\right| \\
&\le \frac{w}{\alpha+1}\|h'\| \le \frac{1}{(\alpha+1)(1-x_\alpha)}\,w(1-w)\|h'\| = \frac{2\alpha}{\alpha+1}\,w(1-w)\|h'\|.
\end{aligned}$$

Hence (40) holds on $[0,x_\alpha]$ with $a_3$ and $a_4$ respectively replaced by

$$a_{3,L} = 4\alpha \quad\text{and}\quad a_{4,L} = \frac{2\alpha}{\alpha+1}.$$



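On $[x_\alpha,1]$, using now that $x_\alpha^{-1} \le 2$ for $\alpha \ge 1$, invoking (37) we have

$$|\alpha(1-w)f(w)| \le \frac{\alpha}{x_\alpha}\,w(1-w)\|f\| \le \frac{2\alpha}{x_\alpha}\|h\|\,w(1-w) \le 4\alpha\|h\|\,w(1-w)$$

and, by (39), using $\mathcal{B}h = 0$ to rewrite the integral over $[0,w]$ as minus the integral over
$[w,1]$, with $g(w) = w^{\alpha-1}h(w)$,

$$\begin{aligned}
|h(w) + wf(w)| &= \left|h(w) - \frac{1}{w^{\alpha-1}(1-w)}\int_w^1 u^{\alpha-1}h(u)\,du\right|
= \left|\frac{1}{w^{\alpha-1}(1-w)}\int_w^1 (g(w) - g(u))\,du\right| \\
&\le \frac{\|g'\|}{w^{\alpha-1}(1-w)}\int_w^1 (u-w)\,du = \frac{\|g'\|(1-w)}{2w^{\alpha-1}},
\end{aligned}$$

where $\|g'\|$ denotes the supremum of $|g'|$ over $[x_\alpha,1]$. As $\alpha > 1$, for $w \in [x_\alpha,1]$ we have that

$$|g'(w)| \le (\alpha-1)w^{\alpha-2}\|h\| + w^{\alpha-1}\|h'\| \le c_\alpha\|h\| + \|h'\|,$$

where

$$c_\alpha = (\alpha-1)\left(2\cdot\mathbf{1}(1 < \alpha < 2) + \mathbf{1}(\alpha \ge 2)\right),$$

noting that

$$w^{\alpha-2} \le x_\alpha^{\alpha-2}\,\mathbf{1}(1 < \alpha < 2) + \mathbf{1}(\alpha \ge 2),$$

and that $x_\alpha^{\alpha-2}$ takes the value 2 at $\alpha = 1$ and then decreases. Hence, for $w \in [x_\alpha,1]$, using
that $1/x_\alpha^{\alpha} \le 2$ we obtain

$$\begin{aligned}
|h(w) + wf(w)| &\le \frac{1-w}{2w^{\alpha-1}}(c_\alpha\|h\| + \|h'\|) \le \frac{1}{2x_\alpha^{\alpha-1}}(c_\alpha\|h\| + \|h'\|)(1-w) \\
&\le \frac{1}{2x_\alpha^{\alpha}}(c_\alpha\|h\| + \|h'\|)\,w(1-w) \le (c_\alpha\|h\| + \|h'\|)\,w(1-w).
\end{aligned}$$

Hence (40) holds on $[x_\alpha,1]$ with $a_3$ and $a_4$ respectively replaced by

$$a_{3,R} = 4\alpha + c_\alpha = 5\alpha - 1 + (\alpha-1)\mathbf{1}(\alpha \in (1,2)) \le 5\alpha \quad\text{and}\quad a_{4,R} = 1 \le \frac{2\alpha}{\alpha+1}.$$

These upper bounds complete the proof for the case where $\alpha > 1$ and $\beta = 1$. The case where
$\alpha = 1$ and $\beta > 1$ can now be handled using Lemma 3.2. □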
We now handle the case where both $\alpha$ and $\beta$ are equal to 1. Though the argument of
Lemma 3.4 may apply here, the constants obtained by specializing to this case are
superior.

Lemma 3.5 The solution $f$ given in (17) to the Stein equation (5) for the case where both
$\alpha$ and $\beta$ are equal to one satisfies

$$\|f\| \le 2\|h - \mathcal{B}h\|,$$

and when $h$ is absolutely continuous the solution $f$ is differentiable and

$$\|f\| \le 2\|h'\| \quad\text{and}\quad \|f'\| \le 4\|h - \mathcal{B}h\| + \|h'\| \le 5\|h'\|.$$
Proof: Replace $h$ by $h - \mathcal{B}h$ and assume $\mathcal{B}h = 0$. Substituting $\alpha = 1$, $\beta = 1$ into (5) yields

$$w(1-w)f'(w) + (1-2w)f(w) = h(w),$$

while the solution (17) specializes to

$$f(w) = \frac{1}{w(1-w)}\int_0^w h(u)\,du.$$

For $w \in [0,1/2]$, and $w \in [1/2,1]$ respectively, we have

$$|f(w)| \le \frac{\|h\|}{1-w} \le 2\|h\| \quad\text{and}\quad |f(w)| \le \frac{1}{w(1-w)}\int_w^1 |h(u)|\,du \le \frac{\|h\|}{w} \le 2\|h\|,$$

thus showing the first claim of the lemma. The second claim follows from the first using
(26).

For the final claim, write

$$w(1-w)f'(w) = h(w) - (1-2w)f(w) = h(w) - (1-w)f(w) + wf(w).$$

On $[0,1/2]$, for the difference we have

$$\begin{aligned}
|h(w) - (1-w)f(w)| &= \left|h(w) - \frac{1}{w}\int_0^w h(u)\,du\right| = \left|\frac{1}{w}\int_0^w (h(w) - h(u))\,du\right| \\
&\le \frac{\|h'\|}{w}\int_0^w (w-u)\,du = \frac{\|h'\|w}{2} \le \|h'\|\,w(1-w).
\end{aligned}$$

For the last term, using the first bound of the lemma,

$$|wf(w)| \le w\|f\| \le 2w\|h\| \le 4w(1-w)\|h\|.$$

On $[1/2,1]$ we obtain the same bounds by

$$|(1-w)f(w)| \le 2(1-w)\|h\| \le 4w(1-w)\|h\|$$

and

$$\begin{aligned}
|h(w) + wf(w)| &= \left|h(w) - \frac{1}{1-w}\int_w^1 h(u)\,du\right| = \left|\frac{1}{1-w}\int_w^1 (h(w) - h(u))\,du\right| \\
&\le \frac{1}{1-w}\int_w^1 |h(w) - h(u)|\,du \le \frac{\|h'\|}{1-w}\int_w^1 (u-w)\,du = \frac{\|h'\|}{2}(1-w) \le \|h'\|\,w(1-w).
\end{aligned}$$

□
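The bounds of Lemma 3.5 can be sanity-checked numerically for a single Lipschitz test function. The sketch below takes $h(u) = |u - 1/2|$, for which $\mathcal{B}h = 1/4$, $\|h - \mathcal{B}h\| = 1/4$ and $\|h'\| = 1$ in the uniform case; the grid size and the finite-difference derivative are assumptions of this illustration, not part of the proof.

```python
def check_uniform_case(cells=2000):
    """Approximate sup|f| and sup|f'| for alpha = beta = 1 and h(u) = |u - 1/2|."""
    bh = 0.25                                   # uniform expectation of h
    step = 1.0 / cells
    integral = 0.0
    f_vals = []
    for i in range(cells):
        u = (i + 0.5) * step
        integral += (abs(u - 0.5) - bh) * step  # integral of h - Bh over [0, (i+1)*step]
        w = (i + 1) * step
        if i < cells - 1:                       # keep interior grid points only
            f_vals.append(integral / (w * (1 - w)))
    sup_f = max(abs(v) for v in f_vals)
    sup_df = max(abs(f_vals[j + 1] - f_vals[j]) / step
                 for j in range(len(f_vals) - 1))
    return sup_f, sup_df

sup_f, sup_df = check_uniform_case()
```

For this test function the lemma predicts `sup_f <= 2 * 0.25` and `sup_df <= 5`; the computed values sit well inside these bounds, which is consistent with the lemma but of course does not prove it.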